Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares

Authors

  • Garvesh Raskutti
  • Michael W. Mahoney

Abstract

We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices constructed from random projections and leverage-score sampling, if the number of samples r is much smaller than the original sample size n, then the worst-case (WC) error is the same as that of solving the original problem, up to a very small relative error. From a statistical perspective, one typically considers the mean-squared error performance of randomized sketching algorithms when data are generated according to a statistical linear model. In this paper, we provide a rigorous comparison of both perspectives, leading to insights on how they differ. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator, and we use our framework to provide upper bounds for several types of random projection and random sampling algorithms. Among other results, we show that the RE can be upper bounded when r is much smaller than n, while the PE typically requires the number of samples r to be substantially larger. Lower bounds developed in subsequent work show that our upper bounds on PE cannot be improved.

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).
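To make the PE/RE distinction concrete, here is a minimal Python sketch that computes both efficiencies for a Gaussian random projection and for leverage-score sampling. It assumes a Gaussian linear model Y = Xβ + ε with ε ~ N(0, σ²I), the sketched estimator β_S = argmin_β ||S(Y − Xβ)||², and defines PE and RE as the expected prediction error E||X(β_S − β)||² and the expected residual sum of squares E||Y − Xβ_S||², each divided by the same quantity for OLS. These definitions and the helper names `gaussian_sketch`, `leverage_sketch` and `efficiencies` are illustrative assumptions, not the paper's code. Because β_S − β = (SX)⁺Sε is linear in ε, both expectations have closed forms for a fixed S, so no Monte Carlo averaging over the noise is needed.

```python
import numpy as np

def gaussian_sketch(r, n, rng):
    # Hypothetical helper: dense Gaussian random projection S (r x n).
    # The 1/sqrt(r) scaling does not change beta_S but keeps ||S v|| ~ ||v||.
    return rng.standard_normal((r, n)) / np.sqrt(r)

def leverage_sketch(X, r, rng):
    # Hypothetical helper: sample r rows i.i.d. with probability pi_i proportional
    # to the leverage score h_ii = ||Q_i||^2 (Q = orthonormal basis of range(X)),
    # rescaling each sampled row by 1/sqrt(r * pi_i) so that E[S^T S] = I.
    n, _ = X.shape
    Q, _ = np.linalg.qr(X)                  # reduced QR: Q is n x p
    lev = np.sum(Q**2, axis=1)
    probs = lev / lev.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    S = np.zeros((r, n))
    S[np.arange(r), idx] = 1.0 / np.sqrt(r * probs[idx])
    return S

def efficiencies(X, S):
    # Assumed definitions: PE and RE relative to OLS under Y = X beta + eps.
    # H maps eps to X(beta_S - beta); E||A eps||^2 = sigma^2 ||A||_F^2, so sigma^2 cancels.
    n, p = X.shape
    H = X @ np.linalg.pinv(S @ X) @ S
    pe = np.linalg.norm(H, "fro") ** 2 / p                   # OLS: sigma^2 * p
    re = np.linalg.norm(np.eye(n) - H, "fro") ** 2 / (n - p)  # OLS: sigma^2 * (n - p)
    return pe, re

rng = np.random.default_rng(0)
n, p, r = 1000, 10, 100
X = rng.standard_normal((n, p))
pe_g, re_g = efficiencies(X, gaussian_sketch(r, n, rng))
pe_l, re_l = efficiencies(X, leverage_sketch(X, r, rng))
print(f"Gaussian projection:  PE={pe_g:.2f}  RE={re_g:.3f}")
print(f"Leverage sampling:    PE={pe_l:.2f}  RE={re_l:.3f}")
```

With n = 1000, p = 10 and r = 100, one should see PE roughly on the order of n/r while RE stays close to 1, in line with the abstract's claim that RE can be controlled with r ≪ n whereas PE needs r to be substantially larger.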

Similar resources

A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

We consider statistical aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For an LS problem with input data $(X, Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^{n}$, where $n$ and $p$ are both large and $n \gg p$, sketching algorithms use a “sketching matrix,” $S \in \mathbb{R}^{r \times n}$, where $r \ll n$, e.g., a matrix representing the process of random sampling or random projection. Then, rather than solving the LS pro...
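The snippet breaks off before stating the sketched problem. In the standard sketch-and-solve formulation (assumed here, not quoted from the paper), one solves min_β ||S(Y − Xβ)||², an r × p problem, instead of the full n × p one. The Python sketch below does this with a Gaussian projection and reports the data-dependent ratio that worst-case (algorithmic) guarantees bound: the sketched residual sum of squares divided by the optimal one, for the fixed data at hand. `sketch_and_solve` and `worst_case_ratio` are hypothetical helper names.

```python
import numpy as np

def sketch_and_solve(X, Y, S):
    # Hypothetical helper: solve min_beta ||S (Y - X beta)||_2^2,
    # an r x p least-squares problem instead of the original n x p one.
    beta_s, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)
    return beta_s

def worst_case_ratio(X, Y, beta_s):
    # Hypothetical helper: ||Y - X beta_S||^2 / ||Y - X beta_OLS||^2 for fixed data.
    # OLS minimizes the residual sum of squares, so this ratio is always >= 1.
    beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((Y - X @ beta_s) ** 2) / np.sum((Y - X @ beta_ols) ** 2)

rng = np.random.default_rng(1)
n, p, r = 5000, 10, 400
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)
S = rng.standard_normal((r, n)) / np.sqrt(r)   # Gaussian projection sketch
beta_s = sketch_and_solve(X, Y, S)
ratio = worst_case_ratio(X, Y, beta_s)
print(f"RSS(sketched) / RSS(OLS) = {ratio:.4f}")
```

For a Gaussian projection with r well above p, the ratio should typically sit just above 1, which is the small relative error that the algorithmic perspective refers to.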

Fast and Guaranteed Tensor Decomposition via Sketching

Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tenso...

Flexible least squares for temporal data mining and statistical arbitrage

A number of recent emerging applications call for studying data streams, potentially infinite flows of information updated in real-time. When multiple co-evolving data streams are observed, an important task is to determine how these streams depend on each other, accounting for dynamic dependence patterns without imposing any restrictive probabilistic law governing this dependence. In this pape...

Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares

We study randomized sketching methods for approximately solving a least-squares problem with a general convex constraint. The quality of a least-squares approximation can be assessed in different ways: either in terms of the value of the quadratic objective function (cost approximation), or in terms of some distance measure between the approximate minimizer and the true minimizer (solution approx...

Comparing the Performance of Least-Squares Estimators: when is GTLS Better than LS?

Several computer vision problems lead to linear systems affected by noise. These are commonly solved by least-squares estimators, the most popular being ordinary least squares (LS), total least squares (TLS) and generalized total least squares (GTLS). However, the statistical or structural assumptions of these theoretical estimators are very often violated in practice. Given that their computat...

Publication date: 2015